Influence Diagnostics for Correlated Data

Authors

  • Martha E. Nunn
  • Brian G. Leroux
Abstract

Correlated or clustered data are frequently encountered in medical and dental research. Although there has been a large amount of research on analytic methods for clustered data, there has been only a limited amount of work in the area of influence diagnostics for these methods. We present a simple approach to influence diagnostics for clustered data that is practical even for very large data sets. By applying a method of partitioning the fixed effects in a linear model with clustered data and exchangeable correlation structure (Scott and Holt, 1982), ordinary least-squares estimation can be used to obtain influence diagnostics with balanced data. Using a series of simulations, we evaluate the performance of the influence diagnostics for a partitioned linear model where data are unbalanced and where cluster deletion is performed as opposed to point deletion. When the correlation structure is exchangeable and the data are unbalanced, these influence measures provide a reasonably high degree of sensitivity for identifying influential observations. This technique may offer a less cumbersome approach to influence diagnostics with clustered data.

Introduction

The study of influence is an important aspect in any type of regression analysis because not all cases in a set of data are equal participants in determining regression coefficient estimates, statistical testing, and other statistics. In fact, the character of a regression model may be determined by only a few observations within a set of data while the majority of the data are largely ignored. A large body of literature related to influence diagnostics for linear models now exists, and there are several measures now available for detecting various types of influential observations, including leverage points and outliers as well as groups of influential observations within a set of data.
Case-deletion diagnostics have reached such popularity in standard linear regression analysis that they are included in most statistical software packages. In addition, several books have been written to further disseminate both the theory and applications of methods for detecting influential observations, including books by Cook and Weisberg (1982), Belsley, Kuh, and Welsch (1983), and Chatterjee and Hadi (1987). Only in recent years have these methods been extended to the analysis of correlated data. Local influence methods for influential groups of observations, first described by Cook (1986), have been extended to mixed analysis of variance models by Beckman, Nachtsheim, and Cook (1987). Case-deletion diagnostics were extended to linear mixed models by Christensen, Pearson, and Johnson (1992). Further extension of case-deletion diagnostics to generalized estimating equations was carried out by Preisser and Qaqish (1996), where both cluster-deletion and point-deletion diagnostics were investigated. Banerjee and Frees (1997) investigated case-deletion diagnostics for both generalized linear mixed models and marginal models (GEE).

In this paper, we present a simple method for case-deletion diagnostics for linear models with clustered data and an exchangeable working correlation structure. By partitioning the fixed effects of a linear model into within-cluster variation and between-cluster variation, we are able to obtain influence diagnostics based on cluster deletion alone that provide information about both influential clusters and clusters that contain influential points. In addition, when an exchangeable working correlation is appropriate, we show that ordinary least squares estimators can be used to obtain cluster-deletion diagnostics with reasonable accuracy compared to generalized least squares estimators.
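As a rough illustration of cluster deletion with ordinary least squares, the sketch below refits the model with each cluster removed in turn and summarizes the shift in the coefficient estimates with a Cook-type distance. The function name `cluster_deletion_distances`, the particular distance formula, and the simulated data are assumptions made for this example; the excerpt does not state which influence measures the authors use.

```python
import numpy as np

def cluster_deletion_distances(X, y, cluster_ids):
    """Cook-type distance for deleting each cluster, using OLS refits.

    X is the design matrix and must already contain an intercept column.
    Returns the unique cluster labels and one distance per cluster.
    This is an illustrative measure, not necessarily the paper's statistic.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids).ravel()
    n, p = X.shape

    # OLS fit on the full data set
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_full
    s2 = resid @ resid / (n - p)      # residual variance estimate
    xtx = X.T @ X

    labels = np.unique(cluster_ids)
    distances = np.empty(labels.size)
    for k, c in enumerate(labels):
        keep = cluster_ids != c       # drop every observation in cluster c
        beta_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta_full - beta_del
        distances[k] = diff @ xtx @ diff / (p * s2)
    return labels, distances

# Simulated clustered data: a shared random intercept per cluster gives an
# exchangeable correlation structure (all settings here are hypothetical).
rng = np.random.default_rng(0)
n_clusters, size = 30, 4
cluster_ids = np.repeat(np.arange(n_clusters), size)
x = rng.normal(size=n_clusters * size)
b = np.repeat(rng.normal(scale=0.5, size=n_clusters), size)
y = 1.0 + 2.0 * x + b + rng.normal(scale=0.5, size=x.size)

# Contaminate cluster 7 with high-leverage, outlying observations.
x[cluster_ids == 7] = 5.0
y[cluster_ids == 7] = -10.0

X = np.column_stack([np.ones_like(x), x])
labels, dist = cluster_deletion_distances(X, y, cluster_ids)
print("most influential cluster:", labels[np.argmax(dist)])
```

Refitting once per cluster is the simplest way to express the idea; with many clusters, closed-form deletion updates would be cheaper, but they are omitted here to keep the sketch short.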
Partitioning Fixed Effects in Linear Models

Let us consider the general linear model defined by

$$Y = \alpha_0 + X_g \alpha_g + e,$$

where $X_g$ is a matrix of unpartitioned covariates of dimension $N \times (p - 1)$, $Y$ is a vector of responses, $e$ is a vector of random errors, $\alpha_0$ is the intercept, $\alpha_g$ is the vector of coefficients for the matrix of unpartitioned covariates, and $N$ is the total number of observations. By partitioning the fixed effects of this general linear model into between-cluster effects and within-cluster effects, the partitioned model can be rewritten as

$$Y = \alpha_0 + X_w \alpha_w + X_b \alpha_b + e,$$

where $X_w$ is a matrix corresponding to the within-cluster variation, $X_b$ is a matrix corresponding to the between-cluster variation, $Y$ is the vector of responses, $e$ is a vector of random errors, $\alpha_0$ is the intercept, $\alpha_w$ is the vector of coefficients for within-cluster variation, and ...

Joint Statistical Meetings Biometrics Section-to include ENAR & WNAR
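One common way to build the within-cluster and between-cluster covariate matrices is to take cluster means of each covariate for the between part and deviations from those means for the within part. The excerpt does not spell out the construction, so the sketch below assumes that decomposition; the helper names `partition_covariates` and `fit_partitioned_ols` and the toy data are hypothetical.

```python
import numpy as np

def partition_covariates(X, cluster_ids):
    """Split covariates into within- and between-cluster parts.

    Assumed construction: X_b holds each cluster's covariate means repeated
    for every member, and X_w = X - X_b holds deviations from those means.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    cluster_ids = np.asarray(cluster_ids).ravel()

    labels, inverse = np.unique(cluster_ids, return_inverse=True)
    counts = np.bincount(inverse)              # cluster sizes (may differ)
    sums = np.zeros((labels.size, X.shape[1]))
    np.add.at(sums, inverse, X)                # per-cluster column sums
    means = sums / counts[:, None]

    X_b = means[inverse]                       # between-cluster component
    X_w = X - X_b                              # within-cluster component
    return X_w, X_b

def fit_partitioned_ols(X_w, X_b, y):
    """OLS fit of y on an intercept, X_w, and X_b.

    Returns coefficients ordered as (alpha_0, alpha_w..., alpha_b...).
    """
    design = np.column_stack([np.ones(len(y)), X_w, X_b])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Small unbalanced example: three clusters of sizes 3, 2, and 4.
cluster_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 0.0, 2.0, 4.0, 6.0])
X_w, X_b = partition_covariates(x, cluster_ids)

# Noise-free response with different within and between slopes.
y = 1.0 + 2.0 * X_w[:, 0] + 3.0 * X_b[:, 0]
coef = fit_partitioned_ols(X_w, X_b, y)
print(coef)
```

Under this construction the within and between columns are orthogonal to each other, which is what lets the two sets of coefficients be read separately.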


Similar resources

Self-Starting Control Chart and Post Signal Diagnostics for Monitoring Project Earned Value Management Indices

Earned value management (EVM) is a well-known approach in a project control system which uses some indices to track schedule and cost performance of a project. In this paper, a new statistical framework based on self-starting monitoring and change point estimation is proposed to monitor correlated EVM indices which are usually auto-correlated over time and non-normally distributed. Also, a new ...


Identifying influential families using regression diagnostics for generalized estimating equations.

The Generalized Estimating Equations (GEE) is an approach to analyze correlated data. It is applied here to data from an epidemiological study of oesophageal cancer in a high incidence area in China to investigate familial aggregation. Regression diagnostics for mean structures and association structures are used to identify families that influence estimates of these structures. It is shown tha...


Influence Diagnostics for the Weibull Model Fit to Censored Data

Methods for detecting influential observations for the Weibull model fit to censored data are discussed. These methods include: one-step deletion diagnostics, influence functions and curvature diagnostics. Results indicate that the curvature diagnostics may be helpful in detecting masking.


Influence Diagnostics for Linear Mixed Models

Linear mixed models are extremely sensitive to outlying responses and extreme points in the fixed and random effect design spaces. Few diagnostics are available in standard computing packages. We provide routine diagnostic tools, which are computationally inexpensive. The diagnostics are functions of basic building blocks: studentized residuals, error contrast matrix, and the inverse of the res...


Linear Regression Diagnostics in Cluster Samples

An extensive set of diagnostics for linear regression models has been developed to handle nonsurvey data. The models and the sampling plans used for finite populations often entail stratification, clustering, and survey weights, which renders many of the standard diagnostics inappropriate. In this article we adapt some influence diagnostics that have been formulated for ordinary or weighted lea...



Publication date: 2002